Regional Bias in the Broad Phonetic Transcriptions of the Spoken Dutch Corpus

نویسندگان

  • Evie Coussé
  • Steven Gillis
چکیده

In this paper, we assess an aspect of the quality of the broad phonetic transcriptions in the Spoken Dutch Corpus (CGN). The corpus contains speech from native speakers of Dutch originating from The Netherlands and the Dutch speaking part of Belgium. The phonetic transcriptions were made by transcribers from both regions. In previous research, we have identified regional differences in the transcribers' behaviour. In this paper, we explore the precise sources of the regional bias in the CGN transcriptions and we evaluate its impact on the phonetic transcriptions. More specifically, (1) the regional bias in the canonical transcriptions that served as the basis for the verification task of the transcribers is critically analysed, and (2) we verify in an experiment the regional bias introduced by the transcribers themselves. The possible effects of this inherent regional bias in the CGN transcriptions on subsequent linguistic analyses are briefly discussed.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Title : Automatic Phonetic Transcription of Large Speech Corpora

Most large speech corpora are delivered with a lexicon that contains a canonical transcription of every word in the orthographic transcription. Such a lexicon can be used for generating a hypothetical ‘canonical’ phonetic transcription from the orthography. In addition, time and money permitting, some speech corpora are provided with a manually verified broad phonetic transcription of at least ...

متن کامل

The Influence of the Labeller's Regional Background on Phonetic Transcriptions: Implications for the Evaluation of Spoken Language Resources

Phonetic transcriptions of spoken language corpora are not an exact written reproduction of the speech signal. They are influenced by a variety of factors such as the transcriber s native categorical perception. What remains unexplored is to what extent variation of perception within the same language exerts any influence on phonetic transcriptions. We report a case study of the labelling of vo...

متن کامل

Assessing Manually Corrected Broad Phonetic Transcriptions in the Spoken Dutch Corpus

For research and development purposes in the areas of phonetics and speech technology, phonetically transcribed speech may be of great value. In the near future, the Spoken Dutch Corpus (CGN) is going to offer such transcriptions for about one thousand hours of spoken Dutch, of which 90% will consist of automatic transcriptions and 10% of manually produced transcriptions. An advantage of automa...

متن کامل

Automatic phonetic transcription of large speech corpora

This study is aimed at investigating whether automatic phonetic transcription procedures can approximate manual transcriptions typically delivered with contemporary large speech corpora. To this end, ten automatic procedures were used to generate a broad phonetic transcription of well-prepared speech (read-aloud texts) and spontaneous speech (telephone dialogues) from the Spoken Dutch Corpus. T...

متن کامل

German Today: a really extensive Corpus of Spoken Standard German

The research project “German Today” aims to determine the amount of regional variation in (near-)standard German spoken by young and older educated adults and to identify and locate regional features. To this end, we compile an areally extensive corpus of read and spontaneous German speech. Secondary school students and 50-to-60-year-old locals are recorded in 160 cities throughout the German s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006